52 research outputs found

    System combination using machine learning in NLP tasks

    The combination of systems is a widely studied research area in the field of Pattern Recognition, where many techniques have been developed to take advantage of the diversity of classification methods currently available thanks to Machine Learning. In this PhD thesis we have studied the existing combination techniques and their degree of involvement in NLP tasks. We also present some works on specific tasks, along with a comparative study of the results obtained by many of these techniques, implemented and applied to the POS-tagging task. The use of a large number of different corpora and the experiments carried out have allowed us to draw some conclusions that we believe will be very useful for applying these techniques within NLP in the future.

    Expansion of domain-specific opinion lexicons using word embeddings

    In this work we present a method for expanding domain-specific opinion lexicons from texts of the chosen domain. The method is based on classifiers that categorize input words as positive, negative or neutral, and on a strict word selection criterion intended to ensure the precision of the new additions to the lexicon. We use word embeddings as the feature space of the classifiers. The results confirm that these representations contain information on the polarity of words: the precision in selecting candidates and estimating their polarity is about 94% for the three domains analyzed, with a coverage of around 50% of the opinion words contained in the initial texts. This work has been funded through the research project AORESCU (P11-TIC-7684 MO)

    Supervised TextRank

    In this paper we investigate how to adapt the TextRank method to make it work in a supervised way. TextRank is a graph-based method that applies the ideas of the ranking algorithm used by Google (PageRank) to Natural Language Processing (NLP) tasks. This approach has given very good results in many NLP tasks such as text summarization, keyword extraction and word sense disambiguation. In all these tasks TextRank operates in an unsupervised way, without using any training corpus. Our main contribution is the definition of a method that allows TextRank to be applied to a graph that includes information generated from a tagged training corpus. We have tested our method on the Part of Speech (POS) tagging task, comparing the results with those obtained with tools specialized in this task. The performance of our system is quite close to that of these tools, and it improves on the results of two of them when the corpus tagset is large and the tagging task is therefore more complicated. Funding: Ministerio de Ciencia y Tecnología TIN2004-07246-C03-0
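
    As an illustration of the unsupervised baseline that the abstract starts from, the following is a minimal sketch of PageRank-style scoring on a word co-occurrence graph. It is not the authors' supervised method, whose graph construction is not described here; the function names, the window size and the damping factor of 0.85 are assumptions chosen for the example.

```python
def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph linking words that co-occur
    within `window` positions of each other (hypothetical helper)."""
    graph = {t: {} for t in tokens}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window + 1]:
            if a != b:
                graph[a][b] = graph[a].get(b, 0.0) + 1.0
                graph[b][a] = graph[b].get(a, 0.0) + 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on a symmetric graph
    given as {node: {neighbor: weight}}."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            rank = 0.0
            # The graph is symmetric, so the neighbors of n are also
            # the nodes that pass part of their score to n.
            for m, weight in graph[n].items():
                total_out = sum(graph[m].values())
                if total_out > 0:
                    rank += weight / total_out * score[m]
            new_score[n] = (1 - damping) / len(nodes) + damping * rank
        score = new_score
    return score


tokens = ["a", "b", "a", "c", "a", "d"]
scores = pagerank(cooccurrence_graph(tokens, window=1))
best = max(scores, key=scores.get)  # the most central word in the toy graph
```

    In this toy input every other word is linked only to "a", so "a" receives the highest score. A supervised variant would need to add nodes or edge weights derived from a tagged corpus, which this sketch does not attempt.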

    An approach to the use of word embeddings in an opinion classification task

    In this paper we show how a vector-based word representation obtained via word2vec can help to improve the results of a document classifier based on bags of words. Both models allow obtaining numeric representations from texts, but they do it very differently. The bag of words model can represent documents by means of widely dispersed vectors in which the indices are words or groups of words. word2vec generates word-level representations, building vectors that are much more compact, where indices implicitly contain information about the context of word occurrences. Bags of words are very effective for document classification, and in our experiments no representation using only word2vec vectors is able to improve their results. However, this does not mean that the information provided by word2vec is not useful for the classification task. When this information is used in combination with the bags of words, the results are improved, showing its complementarity and its contribution to the task. We have also performed cross-domain experiments in which word2vec has shown much more stable behavior than bag of words models. Funding: Junta de Andalucía P11-TIC-7684 M
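
    The abstract does not say how the two representations are combined. One common option, assumed here purely for illustration, is to concatenate the sparse bag-of-words counts with the average of the word vectors of the document. The toy vocabulary and the two-dimensional embeddings below are hypothetical stand-ins for a real word2vec model.

```python
def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    return [float(tokens.count(word)) for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens that have an embedding;
    return a zero vector when none of them do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_features(tokens, vocabulary, embeddings, dim):
    """Concatenate both views (an assumed combination strategy)."""
    return bag_of_words(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


# Hypothetical toy data, not taken from the paper.
vocabulary = ["good", "bad", "room"]
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "room": [0.0, 1.0]}
features = combined_features(["good", "room", "good"], vocabulary, embeddings, dim=2)
```

    The resulting vector has one entry per vocabulary word followed by one entry per embedding dimension, and can be passed to any standard classifier.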

    Processing of the Results from Educational Forum of the Virtual Courses in order to Analyze them: Making automatic reports from the log files

    eLearning, established in more and more institutions and companies, continues to evolve and improve, making its methods a more flexible learning tool used by a growing number of teachers. As part of the natural evolution of these methods, the discipline of Learning Analytics (LA) has emerged, which aims to structure and organize the vast volume of data that can be obtained from educational work involving digital media. We present a piece of software that takes a CSV file extracted from a forum on the Moodle platform and provides information on the interactions among the students of a course, among other data. This information is delivered as a text report and as a file in SQL format. A suitable study of these reports then makes it possible to draw well-founded conclusions about the work carried out with these media and to continue processing the data with other tools.

    Using a business process management system to model dynamic teaching methods

    Enterprise Information Systems have a long track record in optimizing organizations worldwide; among them, Business Process Management (BPM) systems stand out for their great flexibility. BPM models describe business workflows and are highly useful in detecting errors and bottlenecks and in identifying possible improvements. Educational management software tools, on the other hand, offer a large number of functionalities but have yet to take advantage of these techniques. Our main objective is to perform an empirical analysis in this unexplored area to evaluate the advantages of applying BPM in the implementation of innovative and dynamic teaching activities. Using this methodology, we have designed RubricaSoft, a BPM system focused on providing dynamic educational processes. It automates multiple tasks, including peer evaluation, information integration and the management of deadlines. The results have been very promising along the three axes on which the evaluation was carried out: student satisfaction, improvement in academic results and increase in teacher productivity. In one of the processes, the time spent by the teacher was reduced by 80% and student participation increased by 41%. Funding: Ministerio de Economía y Competitividad TIN2017-82113-C2-1-

    Clarifying the semantics of value in use cases through Jackson’s Problem Frames

    Use cases constitute a popular technique for problem analysis, partly due to their focus on thinking in terms of user needs. However, this does not guarantee the discovery of all the subproblems that compose the structure of a given software problem. Moreover, a rigorous application of the technique requires prior consensus about the meaning of I. Jacobson’s statement “a use case must give a measurable value to a particular actor” (The Rational Edge, March 2003). This paper proposes a particular characterisation of the concept of “value” for the purpose of problem structuring. To this end we build on the catalogue of frames for real software problems proposed by M. Jackson (Problem Frames, 2001) and reason about what could be valuable for the user in each problem class. We illustrate our technique with the analysis of a web auction problem.

    Enriqueciendo revisiones de usuarios mediante un sistema de extracción de opiniones

    Web sites based on User-Generated Content (UGC) are potentially valuable in a number of fields. In this work we study the usefulness of these systems for detecting the perception expressed by users about services or items. We have compiled and analyzed opinions shared by users on TripAdvisor, focusing on two aspects: the structured and the unstructured data. We perform a quantitative and a qualitative analysis of the information extracted by an opinion extraction system from our dataset; the latter is especially interesting, since it provides valuable knowledge about the strong and weak points of hotels according to user perceptions, going beyond the structured data. Finally, we study the complementarity of the knowledge extracted from the textual opinions and the structured data, observing a noticeable increase in the amount of information available when both sources are combined. This work has been partially funded by the research projects AORESCU (P11-TIC-7684, Consejería de Innovación, Ciencia y Empresas, Junta de Andalucía), DOCUS (TIN2011-14726-E, Ministerio de Ciencia e Innovación) and ACOGEUS (TIN2012-38536-C03-02, Ministerio de Economía y Competitividad)

    Two deep learning approaches to forecasting disaggregated freight flows: convolutional and encoder–decoder recurrent

    Time series forecasting of disaggregated freight flow is a key issue in decision-making by port authorities. For this purpose, and to test new deep learning techniques, we have selected seven time series of goods imported from Morocco to Spain through the port of Algeciras, and we have tested two deep neural network forecasting models: dilated causal convolutional and encoder–decoder recurrent. We have experimented with four different granularities for each series: quarterly, monthly, weekly and daily. The results show that our neural network models can manage these raw series without first removing seasonality or trend. We also highlight the ability of neural models to work with a fixed input size of one year, making good predictions with the same input size for all granularities. Overall, the two deep learning models have improved on the benchmarks of the M4 forecasting competition. Each neural network model obtains its best results under different circumstances: the recurrent one with daily granularity and intermittent series, and the convolutional one with weekly and monthly granularities.
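
    To make the term "dilated causal convolution" concrete, the following minimal sketch computes a single such layer in plain Python. It is not the network used in the paper: the kernel weights, the dilation rates and the helper names are assumptions for illustration, and a real model would stack several learned layers.

```python
def dilated_causal_conv(series, kernel, dilation):
    """Compute y[t] = sum_k kernel[k] * series[t - k * dilation].

    Only current and past values are used (causal), and positions
    before the start of the series are treated as zeros."""
    output = []
    for t in range(len(series)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * series[idx]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations):
    """Number of past time steps visible after stacking one layer
    per dilation rate, all with the same kernel size."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical toy values: each output adds the value two steps earlier.
smoothed = dilated_causal_conv([1, 2, 3, 4, 5], kernel=[1, 1], dilation=2)
# smoothed == [1.0, 2.0, 4.0, 6.0, 8.0]
span = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
# span == 16
```

    Doubling the dilation at each layer makes the receptive field grow quickly with depth, which is one reason this kind of layer is used when a long input window, such as a full year of observations, has to be covered.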